1 Introduction

Nonlinear dynamical systems often display complex behavior. In this lecture 
I shall review the behavior of stochastically perturbed dynamical systems, 
which is a field of its own. I shall use this as an opportunity to discuss 
applications to computer science, though applications to statistical physics, 
chemical physics, and elsewhere in the sciences are also numerous. 

If a deterministic dynamical system has an attractor, by definition the 
system state approaches the attractor in the long-time limit. But if the 
system is regularly subjected to small stochastic fluctuations (random kicks, 
or noise) this approach will only be approximate. In the long-time limit the 
system state will typically be specified by a probability distribution (a 'noisy 
attractor') centered on the attractor proper. In the limit as the noise strength 
tends to zero, this distribution will converge to the attractor. 

Even if the system has a single globally stable point as its only attractor, 
one can pose an interesting question: What is the probability, if the noise 
strength is very small, of finding the system in a specified state macroscopi- 
cally distant from the attractor? How long must one wait before this occurs? 
If the system has more than a single stable state, each with its own basin 
of attraction, one can similarly ask for the timescale on which transitions 
between the two basins occur. Such questions are really questions about the
character of the extreme tail of the noisy attractor, and can be answered 
only by quantifying the probability of large fluctuations of the system. The 
mathematical field dealing with such matters is known as large deviation 
theory [?], [?].

In scientific applications one would usually like to know not only how fre- 
quently atypical fluctuations occur, but also along which trajectory the sys- 
tem state moves during transitions from one stable state to another. It turns 
out that in most stochastically perturbed dynamical systems a single trajec- 
tory in the system state space, or at most a discrete set, is singled out in the 
limit of weak noise as by far the most likely. 

This phenomenon has long been known to chemical and statistical physicists,
but its importance in other fields which make use of stochastic modelling,
such as ecology and evolutionary biology, has only recently become clear
[?], [20]. In chemical physics the most likely transition trajectory is
interpreted as a reaction pathway, since chemical reactions are modelled as
transitions from a metastable state to a more stable state [25]. But the
mathematical approach I shall sketch is much more general: the dynamical
system can be continuous or discrete, and the system dynamics need not obey
detailed balance. Some of the strongest results on systems without detailed
balance have only recently been obtained [14]. The system can even be
distributed, with nontrivial spatial extent; this includes stochastic cellular
automata, and systems specified by stochastic partial differential equations
rather than stochastic ordinary differential equations.

The quasi-deterministic phenomena (optimal trajectories, well-defined re- 
action pathways, etc.) which arise in stochastically perturbed dynamical 
systems can be viewed as emergent. They are determined by the stochastic 
dynamics, but in a rather complicated way, and they manifest themselves 
only in the weak-noise limit. Their appearance in computer science applications
is not well known; I hope the two examples treated in this lecture will
correct that. Attempts have recently been made to interpret the behavior of 
computers, or interacting networks of computers, in dynamical system terms 
or even ecological terms [0]. But stochasticity is, I think, a crucial part of 
any such interpretation. 
2 A Simple Stochastic Model: ALOHAnet



As a first example drawn from computer science, consider a stochastic model 
which attempts to capture the essential features of a large number of com- 
puters communicating with each other across a data network, such as an 
Ethernet. The model will be idealized, but it will be typical of ("in the same
universality class as") models in which a large number of agents share
occasional access to a single resource. Here the resource will be the network bus:
the ether, which only one computer can use at a time. 

You are no doubt familiar with such application programs as telnet and
ftp, which allow a user of one machine to communicate with another. Behind
the scenes ("at a lower protocol layer," in telecommunications jargon) these
programs work as follows [24]. A connection between two computers consists
of a stream of data packets, each typically containing between 10 and $10^3$
bytes. (A data packet is simply a train of square waves.) An interactive login
program like telnet normally transmits a packet whenever the user presses
a key; the packet contains the typed character. Less interactive programs like
ftp, which transfers whole files, employ larger packets. There is a scheme
known as TCP/IP (Transmission Control Protocol/Internet Protocol) for
specifying the destination of packets, and for keeping the two communicating
computers synchronized. This last task may involve the transmission of
additional packets.

Let us suppose that a computer is making substantial use of the network: 
several users are running ftp simultaneously, for example. In such a situation 
a statistical treatment is possible. In the context of a particular stochastic 
model, it is possible to estimate mean network usage, and the probability 
that data packets are transmitted successfully. That is what I shall now do. 

A slight digression is necessary on the issue of successful transmission. 
Ethernet, besides being a tradename, is a multiaccess protocol: a scheme 
for sharing access to the cable connecting two or more computers. Normally 
when a computer wishes to transmit a packet, it does so immediately. It is 
possible therefore for two machines to transmit colliding packets, in which 
case both packets are corrupted: the information in both is lost. The Ethernet
protocol (a CSMA/CD [Carrier Sense Multiple Access/Collision Detect]
protocol) embodies a heuristic for minimizing the probability of collisions,
i.e., of unsuccessful transmissions.

A description of the protocol may be found in the book of Bertsekas
and Gallager [?]. On grounds of simplicity I shall model a conceptually
similar but simpler protocol known as ALOHAnet. ALOHAnet was one of 
several Ethernet precursors, developed at the University of Hawaii during 
the 1970's. Although it has long since been superseded, it lives on in the 
form of a tractable mathematical model. The stochastic ALOHAnet model 
is a discrete-time model or Markov chain, unlike the continuous-time models 
which must be employed in the performance analysis of real-world Ethernets. 
The following description is standard [?], [13].

Suppose that N computers are attached to the network; N will eventually 
be taken to infinity, yielding a continuum limit which (if proper scaling is 
imposed) can be viewed as a weak-noise limit. At each integer time
$j = 1, 2, 3, \ldots$ a packet of data originates with probability $p_0$ on each computer
not currently blocked. When is a computer blocked? When a previously 
generated packet has failed to be transmitted successfully, and the packet is 
awaiting retransmission. 

Newly generated packets are always transmitted immediately, but of 
course they may collide with packets transmitted by other computers at 
the same integer time. Such collisions are immediately detected, and each 
of the transmitting computers enters a blocked state (if it was not blocked 
already). While in the blocked state, at each subsequent integer time a computer
will attempt a retransmission with probability $p_1$. In other words each
of the blocked computers backs off a random amount of time, and tries again
to transmit its packet. The backoff time is geometrically distributed, with
parameter $p_1$. This random backoff policy facilitates the breaking of the
deadlock: if the blocked computers each backed off a fixed amount of time, 
they would simply run into each other again. 

This ALOHAnet model has only three parameters: $p_0$, $p_1$, and $N$. If $y_j$
is the number of computers blocked at time $j$, then $y_1, y_2, y_3, \ldots$ is a Markov
chain on the discrete state space $\{0, 1, 2, \ldots, N\}$. Let us analyse this Markov
chain.

At any time $j$, the number of retransmitted packets is binomially
distributed, with parameters $p_1$ and $y_j$. Similarly, the number of newly
generated (and transmitted) packets is binomially distributed with parameters
$p_0$ and $N - y_j$. If $X_1$ and $X_0$ denote these two random variables, the total
number of packets transmitted at integer time $j$ is $X_1 + X_0$, and

$$
\xi = y_{j+1} - y_j =
\begin{cases}
-1, & \text{if } X_0 = 0,\ X_1 = 1; \\
X_0, & \text{if } X_0 + X_1 > 1; \\
0, & \text{otherwise.}
\end{cases}
\tag{1}
$$

$y_j$ will decrease by 1 if a previously unsuccessfully transmitted packet (and
only that packet) is retransmitted. It will increase by $X_0$ in the event of a
collision, and so forth. From (1), it is easy to work out the density of the
random variable $\xi = \Delta y$.
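
The one-step rule (1) can be simulated directly. The following is a minimal Python sketch and not part of the original model description: the names `aloha_step` and `simulate`, the seed, and the sample values of the three parameters are illustrative assumptions.

```python
import random

def aloha_step(y, n, p0, p1, rng):
    """Advance the number of blocked computers y by one integer time step."""
    # X1: retransmissions by the y blocked computers (binomial, parameter p1).
    x1 = sum(rng.random() < p1 for _ in range(y))
    # X0: new packets from the n - y unblocked computers (binomial, parameter p0).
    x0 = sum(rng.random() < p0 for _ in range(n - y))
    if x0 == 0 and x1 == 1:
        return y - 1      # a lone retransmission succeeds
    if x0 + x1 > 1:
        return y + x0     # collision: every new sender becomes blocked
    return y              # idle slot, or a lone new packet succeeds

def simulate(n, p0, p1, steps, y_start=0, seed=0):
    """Return the sample path y_0, y_1, ..., y_steps of the Markov chain."""
    rng = random.Random(seed)
    y, path = y_start, [y_start]
    for _ in range(steps):
        y = aloha_step(y, n, p0, p1, rng)
        path.append(y)
    return path

# Illustrative values only (assumed, not taken from the text at this point).
path = simulate(n=100, p0=0.0043, p1=0.05, steps=2000, seed=1)
print(path[-1] / 100)  # fraction of blocked computers at the final step
```

Drawing each binomial variable as a sum of Bernoulli trials keeps the sketch dependency-free; for large N a library binomial sampler would be faster.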

Since we wish to construct a continuum large-$N$ limit we define the
normalized network state $x$ at any time to be $y/N$, the fraction of computers
that are currently blocked. Necessarily $0 \le x \le 1$. Besides scaling the state
space in this way, we scale time by defining normalized time $t$ to equal $j/N$,
so that $x$, if viewed as a function of $t$, jumps at $t = 1/N, 2/N, \ldots$ by a
random quantity $N^{-1}\xi$. The density of the random variable $\xi$ is specified by the
current normalized state $x$; we write $\xi$ as $\xi(x)$ to make this clear.

To get a nontrivial large-$N$ limit we need to scale the probabilities $p_0$
and $p_1$ as well; we take $p_0 = q_0/N$ and $p_1 = q_1/N$, for some $N$-independent
$q_0$ and $q_1$. So $q_0(1 - x)$ is the expected number of newly generated packets,
and $q_1 x$ the expected number of retransmitted packets, at any specified
normalized time $j/N$. It is an easy exercise to verify that in the large-$N$
limit

$$
\langle \xi(x) \rangle = q_0(1 - x) - [q_0(1 - x) + q_1 x] \exp[-q_0(1 - x) - q_1 x]
\tag{2}
$$

is the expected change in the number of blocked computers, at any specified
time $j/N$. The formula (2) gives us an explicit expression for $\langle \Delta x \rangle$, the mean
amount by which the normalized state $x$ changes at any specified time $j/N$;
it is simply $N^{-1} \langle \xi(x) \rangle$. So in the large-$N$ limit the dynamics of our network
model are in expectation completely specified by (2).
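
The "easy exercise" can be sketched as follows, assuming (as in the large-N limit) that the numbers of new and retransmitted packets are independent Poisson variables. The shorthand a and b is introduced here for illustration only and does not appear in the original.

```latex
% a, b: shorthand introduced for this sketch only
\begin{align*}
a &= q_0(1-x), \qquad b = q_1 x, \\
\langle \xi \rangle
  &= \mathrm{E}[X_0;\ X_0 + X_1 > 1] - \Pr\{X_0 = 0,\ X_1 = 1\} \\
  &= \bigl(\mathrm{E}[X_0] - \Pr\{X_0 = 1,\ X_1 = 0\}\bigr) - \Pr\{X_0 = 0,\ X_1 = 1\} \\
  &= a - a e^{-a} e^{-b} - e^{-a}\, b e^{-b} \\
  &= a - (a + b)\, e^{-(a+b)} .
\end{align*}
```

The second line uses the fact that the only outcome with a new packet and no collision is a single new packet and no retransmission.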

We can now see how the ALOHAnet model can be viewed as a stochastically
perturbed dynamical system. In expectation, the large-$N$ ALOHAnet
model looks very like a one-dimensional dynamical system

$$
\dot{x}(t) = \langle \xi(x) \rangle
\tag{3}
$$

defined on the closed interval [0, 1]. Such an associated deterministic dynamical
system is called a fluid approximation by network performance analysts.
Although (as we shall see) it cannot answer the questions about large
fluctuations in which we are interested, the fluid approximation says quite a bit
about the stability of the network. In Fig. 1, the drift field $\langle \xi(x) \rangle$ is plotted
as a function of $x$, for $q_0 = 0.43$ and $q_1 = 5.0$ (parameter values originally
chosen by Günther and Shaw [?]). It is clear that for this choice of parameters
the system has two point attractors: $x_0 \approx 0.150$ and $x_1 \approx 0.879$. Each
has its own basin of attraction, and in the fluid approximation the network 
state flows deterministically to one or the other. The two attractors are in- 
terpreted as follows. Networks, in particular heavily loaded networks, are 
prone to congestion, and the two attractors are respectively a low-congestion 
and a high-congestion state. 
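
The fixed points of the fluid approximation can be located numerically. The sketch below is illustrative only: it takes the drift to be q_0(1 - x) - [q_0(1 - x) + q_1 x] exp[-q_0(1 - x) - q_1 x], and the function names, grid size, and bisection routine are assumptions of this example rather than anything prescribed by the text.

```python
import math

Q0, Q1 = 0.43, 5.0  # scaled parameters quoted in the text for Fig. 1

def drift(x, q0=Q0, q1=Q1):
    """Mean drift of the fluid approximation at normalized state x."""
    a, b = q0 * (1.0 - x), q1 * x
    return a - (a + b) * math.exp(-(a + b))

def bisect(f, lo, hi, iters=100):
    """Root of f between lo and hi, assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fixed_points(n_grid=1000):
    """Zeros of the drift on [0, 1], found from sign changes on a grid."""
    xs = [i / n_grid for i in range(n_grid + 1)]
    return [bisect(drift, left, right)
            for left, right in zip(xs, xs[1:])
            if drift(left) * drift(right) < 0]

roots = fixed_points()
# A zero is attracting when the drift is positive just below it and negative just above.
stable = [r for r in roots if drift(r - 1e-6) > 0 > drift(r + 1e-6)]
print(roots, stable)
```

Besides the two attractors, the scan also turns up an intermediate zero of the drift at which the flow is repelling; it separates the two basins of attraction.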

The presence of more than a single attractor, for certain parameter val- 
ues, is an unfortunate feature of the ALOHAnet protocol. If at time zero 
all computers begin unblocked, with these parameter values the fraction of 
blocked computers will swiftly rise to $\approx 0.150$. If on the other hand at time
zero the computers all begin in the blocked state, the fraction will decrease
to $\approx 0.879$ and no further. In the latter case very few packets are successfully
transmitted or retransmitted, since the probability of more than a single
computer transmitting a packet is always very high. (Since $q_1 = 5.0$, when
$x \approx 1$ about 5 computers, on average, attempt to retransmit a packet at
each time $j/N$.) The ALOHAnet protocol makes no provision for breaking
the deadlock by sharing the network in a sequential or round-robin fashion: 
in the event of extreme congestion, the computers get in each others' way. 

The appearance of more than a single point attractor is actually a bit
atypical; it will occur only for certain values of the scaled parameters. (See Fig. 2.)
The $(q_0, q_1)$-plane is divided into two regions: a monostable (one-attractor)
region, and a bistable (two-attractor) region. The equilibrium blocking
fraction is a single-valued function of $(q_0, q_1)$ in the former region, and a
double-valued function in the latter. Nelson [?] has shown that this phenomenon,
which is so suggestive of statistical-mechanical critical behavior, generalizes
naturally to multidimensional parameter spaces. The Ethernet protocol
modifies the packet retransmission probability each time an unsuccessful
retransmission occurs, so a more realistic ALOHAnet model would be specified by
a vector $(p_0, p_1, p_2, \ldots)$ of probabilities, with $p_k$, $k \ge 1$, the probability of
transmitting a packet which has failed to be successfully transmitted
exactly $k$ times. The corresponding normalized system state would be a vector
$(x^{(1)}, x^{(2)}, \ldots)$ of blocking fractions: $x^{(k)}$, $k \ge 1$, would be the fraction of
computers which are blocked and which have failed to transmit a stored packet
exactly $k$ times. The analogue of Fig. 2 would be a multidimensional phase
diagram, some regions in which would be characterized by the presence of
multiple point attractors in the multidimensional normalized state space.

The preceding treatment has been entirely in the context of the deterministic
fluid approximation. The network state does not actually evolve
deterministically, except in expectation. The expected increment $\langle \Delta x \rangle$ equals
$N^{-1} \langle \xi(x) \rangle$, but the standard deviation of $\Delta x$ is also proportional to $N^{-1}$.
$\Delta x$ equals $\langle \Delta x \rangle$ plus $\Delta x - \langle \Delta x \rangle$, and the latter term can be viewed as
a stochastic perturbation superimposed on the dynamical system. These 
stochastic perturbations will broaden the point attractors into noisy attrac- 
tors, and occasionally induce transitions between them. 

These transitions are of considerable practical interest, since they are sud- 
den changes in network congestion. A heavily loaded network can suddenly 
shift from a low-congestion state to a high-congestion state, in which almost 
no packets are transmitted successfully. (This has rather drastic effects on 
the computers attached to the network!) But to model such transitions, 
a fully stochastic treatment is necessary. 



3 Wentzell-Freidlin Theory 

The techniques employed to estimate the transition time between metastable
states, and in general to estimate the probability of unlikely events in the
weak-noise limit, go under the name of Wentzell-Freidlin theory [?].
Wentzell-Freidlin theory is simply the large deviation theory of stochastically
perturbed dynamical systems. Many results in this area are due to physicists
and chemists [?], [23], [?], but Wentzell and Freidlin were the first to put the
subject on a sound mathematical footing [?], [27]. I shall summarize their
main results, and extensions.

Consider a multidimensional random process $x(t)$ similar to the normalized
ALOHAnet process. $x(t)$ is assumed to jump at times $t = N^{-1}, 2N^{-1}, 3N^{-1}, \ldots$,
and the jump magnitude is $N^{-1}$ times a random vector whose distribution
depends on the current state $x$. We write this random vector as $\xi(x)$, so
$\Delta x = N^{-1} \xi(x)$. The $N \to \infty$ limit will be a weak-noise limit.

This random process strongly resembles a diffusion process with drift.
In fact the expected drift velocity at any point $x$ is $u(x) = \langle \xi(x) \rangle$, and
the diffusion tensor is $N^{-1}$ times $D_{ij}(x) = \mathrm{Cov}(\xi_i(x), \xi_j(x))$, the covariance
matrix of the components of $\xi(x)$. A continuous-time diffusion process $x(t)$
with these parameters would satisfy the stochastic differential equation

$$
dx_i(t) = u_i(x(t))\,dt + \sum_j \frac{\sigma_{ij}(x(t))}{\sqrt{N}}\,dw_j(t)
\tag{4}
$$

where $dw(t)$ is white noise, and the tensor $\sigma = (\sigma_{ij})$ is related to the
tensor $D = (D_{ij})$ by $D = \sigma\sigma^T$. But this continuous-time 'diffusive
approximation' to the underlying jump process is not especially useful for our purposes:
the large fluctuations of the jump process turn out to depend crucially on
the higher moments of $\xi(x)$.

Suppose that $x_0$ is an attractor for the expected drift field $u(x)$. Then
in expectation $x(t)$ will tend to flow toward $x_0$ if it begins in the basin of
attraction of $x_0$. Thereafter, $x(t)$ will tend to wander near $x_0$ for a long
time. But statistical fluctuations of all magnitudes will occur; the stochastic
perturbations $N^{-1}[\xi(x) - u(x)]$ will eventually push $x$ outside any specified
region $U$ surrounding $x_0$. In other words, the noise will eventually overcome
the drift.

Since the effective diffusion coefficient decays as $N^{-1}$, one expects that
the time to exit any specified region $U$ grows (in expectation) exponentially
in $N$. That is correct, and Wentzell-Freidlin theory provides a technique
for computing the asymptotic exponential growth rate. This will of course
depend on the choice of $U$. In most applications $U$ is the entire basin of
attraction of the attractor $x_0$, though a smaller region could be chosen.

The technique is as follows. According to theory the expected exit time
$\langle t_{\mathrm{exit}} \rangle$ has weak-noise asymptotics

$$
\langle t_{\mathrm{exit}} \rangle \sim \exp(N S_0), \qquad N \to \infty
\tag{5}
$$

where

$$
S_0 = \inf \int L(x(t), \dot{x}(t))\,dt
\tag{6}
$$

is a minimum action for exiting trajectories. The infimum is taken over all
trajectories $x(t)$ which begin at $x_0$ and terminate on the boundary of $U$. The
transit time is left unspecified. Here $L(x, \dot{x})$ is a Lagrangian function, dual to
a Hamiltonian or energy function constructed from the distribution of $\xi(x)$
by the formula

$$
H(x, p) = \log \langle \exp(p \cdot \xi(x)) \rangle.
\tag{7}
$$

It is clear that the higher moments of $\xi(x)$ enter into the computation of
the function $H$. $H(x, \cdot)$ is in fact the cumulant generating function of the
random variable $\xi(x)$.
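
To make the role of the higher moments concrete, here is a short worked expansion. It assumes that "dual" means the usual Legendre-Fenchel transform, and it writes u(x) and D(x) for the mean and covariance of the jump vector as above; the label L_diff is introduced only for this illustration.

```latex
\begin{align*}
H(x,p) &= u(x) \cdot p + \tfrac{1}{2}\, p \cdot D(x)\, p + O(|p|^3), \\
L(x,\dot{x}) &= \sup_{p}\, \bigl[\, p \cdot \dot{x} - H(x,p) \,\bigr].
\end{align*}
% Keeping only the first two cumulants (the diffusive approximation), and
% assuming D(x) is invertible, the supremum can be evaluated in closed form:
\[
L_{\mathrm{diff}}(x,\dot{x})
  = \tfrac{1}{2}\, \bigl(\dot{x} - u(x)\bigr) \cdot D(x)^{-1} \bigl(\dot{x} - u(x)\bigr).
\]
```

The cubic and higher terms in p, which the diffusive approximation discards, are exactly where the third and higher cumulants of the jump distribution enter; they matter whenever the momentum along an exit path is not small.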

The sudden appearance of a classical Hamiltonian and its dual Lagrangian
is quite remarkable. They are not mere mathematical auxiliaries. The
trajectory $x^*(t)$ minimizing the action (it usually exists, and is unique) is
interpreted as the most probable exit path (MPEP) in the limit of weak noise.
It is not difficult to check, using standard methods of classical mechanics,
that the optimization of the action over transit times yields an MPEP which
is a classical trajectory of zero energy. So the 'momentum' $p$, which has no
direct physical interpretation, as a function of position $x$ along the MPEP
must satisfy

$$
\langle \exp(p \cdot \xi(x)) \rangle = 1.
\tag{8}
$$

If the state space is one-dimensional, this zero-energy constraint alone will
determine the MPEP.

The MPEP $x^*$ is not only a most probable exit path: it is also an exit path
of least resistance. Although $x(t)$ will remain in $U$ for an exponentially long
time, it will fluctuate out along the MPEP (and in other directions) an
exponentially large number of times before the MPEP is traversed in full and $U$ is
exited. The final fluctuation will follow $x^*$ quite closely in the large-$N$ limit.
One can view the equilibrium distribution of the system state $x$ (the noisy
attractor) as being concentrated near $x_0$, but having a tube-like protuberance
stretching out toward the boundary of $U$ along the trajectory $x^*$. In the
large-$N$ limit the tube is exponentially suppressed, and the noisy attractor
converges to the point attractor $x_0$.

$\langle t_{\mathrm{exit}} \rangle$ grows exponentially in $N$, but the limiting distribution of $t_{\mathrm{exit}}$ has
not yet been specified. It turns out to be an exponential distribution. This
is very typical of weak-noise escape problems, where the probability of any 
single escape attempt is small. (The same exponential distribution is seen in 
radioactive decay.) 

$S_0$, the weak-noise growth rate of the expected exit time, can be viewed as
a barrier height: a measure of how hard it is to overcome the drift driving $x$
toward $x_0$ and away from the boundary of $U$. In fact the Wentzell-Freidlin
framework, if extended to conservative continuous-time processes described
by (4), yields the familiar Arrhenius law for the growth of the exit time in the
limit of weak noise. For such systems $S_0$ is simply the height of the potential
barrier surrounding the attractor.

What is not clear from the Wentzell-Freidlin treatment (and is still not
rigorously clear, though numerous nonrigorous results have been obtained
[?], [16], [?]) is the subdominant large-$N$ asymptotics of $\langle t_{\mathrm{exit}} \rangle$. In general
one expects

$$
\langle t_{\mathrm{exit}} \rangle \sim C N^{\alpha} \exp(N S_0), \qquad N \to \infty,
\tag{9}
$$

for some constants $C$ and $\alpha$, but Wentzell-Freidlin theory yields only the
exponential growth rate $S_0$. The pre-exponential factor in (9) remains to be
determined.

The current status of the prefactor problem can be summed up as follows.
If $U$ is taken to be the entire basin of attraction of $x_0$, $\alpha$ is typically zero,
and $C$ can be obtained by a method of matched asymptotic expansions,
i.e., a method of systematically approximating the equilibrium distribution
of $x$. However in multidimensional models there is an entire zoo of possible
pathologies, including the appearance of caustics and other singular curves
in the state space [2], [?], which can induce a nonzero $\alpha$ and/or hinder a
straightforward computation of $C$. This is the case, at least, for
continuous-time diffusion processes defined by stochastic differential equations. The
situation for jump processes is expected to be similar.



4 Applying the Theory 

Wentzell-Freidlin theory, with extensions, can be applied to the stochastic 
ALOHAnet model, and to other stochastically perturbed dynamical systems 
arising in computer science. The quantity most readily computed is $S_0$, the
exponential growth rate in the weak-noise limit of the expected time before 
the system leaves a specified region surrounding a point attractor in the 
system state space. Recall that in the ALOHAnet model this region is the 
basin of attraction; departure from it signals a drastic change in network 
congestion. 

If the system state space is one-dimensional, as in the ALOHAnet model,
the classical-mechanical interpretation of $S_0$ facilitates its computation. $S_0$ is
always the action of a zero-energy trajectory, with energy as a function of
position and momentum given by the formula (7). This Hamiltonian is a
convex function of $p$ at fixed $x$, so if the state space is one-dimensional
(and $\langle \xi(x) \rangle \ne 0$, which will always be the case within the basin of attraction)
the equation $H(x, p) = 0$ will have only two solutions for $p = p(x)$. One
of these is $p = 0$, which is unphysical. This solution is unphysical because
if $p = 0$

$$
\dot{x} = \frac{\partial H}{\partial p}
  = \frac{\langle \xi(x) \exp(p \cdot \xi(x)) \rangle}{\langle \exp(p \cdot \xi(x)) \rangle}
  = \langle \xi(x) \rangle
\tag{10}
$$

and the $p = 0$ trajectory simply follows the mean drift, which points toward
the attractor rather than away. The MPEP must be a classical trajectory
emanating from the attractor, so in a one-dimensional system it is uniquely
characterized by the condition that $p = p(x)$ be the nonzero solution of
$H(x, p) = 0$. Actually there are two such trajectories, one emanating to
either side of the attractor; the true MPEP will be the one with lesser action.

In general to compute $S_0$, even in higher-dimensional models one needs
only the MPEP and the momentum as a function of position along it. This
is because the action of any zero-energy classical trajectory may be written
as a line integral of the momentum, so that

$$
S_0 = \int p(x) \cdot dx
\tag{11}
$$

the integral being taken along the MPEP from the attractor to the boundary
of the region. But only in one-dimensional models is (11) easy to apply. In
$d$-dimensional models merely finding the MPEP requires an optimization over
the $(d - 1)$-dimensional family of zero-energy trajectories extending to the
boundary. Except in models with symmetry, this optimization must usually
be performed numerically.

4.1 The ALOHAnet Application 

In the ALOHAnet model, the expected drift as a function of normalized
network state $x$ is given by (2). But to study large fluctuations, and compute
the MPEP, one needs the Wentzell-Freidlin Hamiltonian $\log \langle \exp(p\,\xi(x)) \rangle$.
In the large-$N$ limit the random variables $X_1$ and $X_0$, in terms of which $\xi$ is
expressed by (1), become respectively a Poisson random variable with
parameter $q_1 x$ and a Poisson random variable with parameter $q_0(1 - x)$. A bit
of computation yields

$$
H(x, p) = \log \Bigl[ e^{q_0(1-x)(e^p - 1)}
  + q_0(1-x)\, e^{-q_0(1-x) - q_1 x} (1 - e^p)
  + q_1 x\, e^{-q_0(1-x) - q_1 x} (e^{-p} - 1) \Bigr]
\tag{12}
$$

as the Hamiltonian.

If the parameters $q_0$ and $q_1$ are known, it is easy to compute the momentum
$p = p(x)$ along the MPEP, by numerically solving for the nonzero solution
of the implicit equation $H(x, p(x)) = 0$. But the MPEP, and hence $S_0$,
will depend on the choice of basin of attraction. With the parameter values
$q_0 = 0.43$ and $q_1 = 5.0$ of Fig. 1, the two attractors $x_0 \approx 0.150$ and $x_1 \approx 0.879$
have respective basins of attraction $[0, x_c)$ and $(x_c, 1]$, with $x_c \approx 0.278$ the
intermediate repellor. MPEPs extend from $x_0$ to $x_c$, and from $x_1$ to $x_c$.
Numerical integration of $p(x)$ gives

$$
S_0[x_0 \to x_c] \approx 0.00177
\tag{13}
$$

$$
S_0[x_1 \to x_c] \approx 0.014
\tag{14}
$$

as the growth rates of the expected transition times.
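
The procedure just described (solve H(x, p) = 0 for the nonzero root, then integrate p along the path) can be sketched in Python. This is an illustration under stated assumptions, not the computation actually used for (13) and (14): it takes the Hamiltonian to be the logarithm of exp[a(e^p - 1)] + a e^{-(a+b)}(1 - e^p) + b e^{-(a+b)}(e^{-p} - 1), with a = q_0(1 - x) and b = q_1 x, and the function names, root brackets, and midpoint rule are choices made here. In a rough check the first action lands near the value in (13); the second comes out noticeably larger than the figure printed in (14), so either this sketch or the printed digits may be at fault, and the output should be treated as illustrative only.

```python
import math

Q0, Q1 = 0.43, 5.0  # parameter values quoted for Fig. 1

def rates(x):
    """Poisson parameters of new (a) and retransmitted (b) packets at state x."""
    return Q0 * (1.0 - x), Q1 * x

def drift(x):
    a, b = rates(x)
    return a - (a + b) * math.exp(-(a + b))

def hamiltonian(x, p):
    """log <exp(p xi(x))>, written with expm1/log1p for accuracy near p = 0."""
    a, b = rates(x)
    e = math.exp(-(a + b))
    m_minus_1 = (math.expm1(a * math.expm1(p))
                 - a * e * math.expm1(p) + b * e * math.expm1(-p))
    return math.log1p(m_minus_1)

def bisect(f, lo, hi, iters=100):
    """Root of f between lo and hi, assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def momentum(x):
    """Nonzero root of H(x, p) = 0; by convexity it lies opposite to the drift."""
    step = -math.copysign(1e-12, drift(x))
    while hamiltonian(x, step) <= 0.0:   # walk outward until H turns positive
        step *= 2.0
    return bisect(lambda p: hamiltonian(x, p), 0.5 * step, step)

def action(x_start, x_end, n=1000):
    """Midpoint-rule estimate of the integral of p(x) dx from x_start to x_end."""
    h = (x_end - x_start) / n
    return sum(momentum(x_start + (i + 0.5) * h) for i in range(n)) * h

# Brackets below were chosen by inspecting the sign of the drift (assumption).
x0 = bisect(drift, 0.05, 0.20)   # low-congestion attractor
xc = bisect(drift, 0.20, 0.40)   # intermediate repellor
x1 = bisect(drift, 0.60, 0.95)   # high-congestion attractor
s_low = action(x0, xc)
s_high = action(x1, xc)
print(x0, xc, x1, s_low, s_high)
```

The midpoint rule is used so that the integrand is never evaluated exactly at a fixed point, where the two roots of H(x, p) = 0 merge at p = 0.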

We see that for the stochastically modelled ALOHAnet, in the large-$N$
limit a reduced description is appropriate. Asymptotically, it becomes a
two-state process. The network is either in a low-congestion state (the basin of
attraction of $x_0$) or a high-congestion state (the basin of attraction of $x_1$), and
the transition rates between them (the reciprocals of the expected transition
times) display exponential falloffs

$$
\exp(-N S_0[x_0 \to x_c]), \qquad \exp(-N S_0[x_1 \to x_c])
\tag{15}
$$

respectively. With the above choice of parameters, for reasonable-sized N 
the latter transition rate is much smaller than the former. The network, 
once congestion has interfered with the proper performance of the backoff 
algorithm, gets 'stuck' for potentially a long time. This is clearly not a good 
choice of network parameters! 

In a real-world $N$-computer ALOHAnet implementation, $q_0$ would be the
total network load, and would be determined by the level of interprocessor
computing taking place on the network. The backoff parameter $q_1 = N p_1$
however would probably be fixed, with $p_1$ hardcoded in a data communications
chip installed in each computer. So the Wentzell-Freidlin approach
could be employed to determine the likelihood, as a function of network load, 
of irreversible (or all but irreversible) congestion occurring. 

Of course the bistability of the system is itself a function of $q_0$ and $q_1$.
As noted, for many values of the parameters the network is monostable: there
is only a single attractor, which may be characterized by a comparatively low
level of congestion. For such a network one could compute an action $S_0$ for
any specified maximum tolerable congestion level. The associated optimal
(i.e., most probable) approach path would be computed much as the MPEP
is computed in the bistable case.



4.2 A Colliding Stacks Application 

There have been several applications of large deviation theory to the stochastic
modelling of dynamic data structures [10], [11], [12]. The memory usage of
a program or programs being executed by a computer can be modelled as 
a discrete-time jump process. In many cases this process may be viewed 
as a finite-dimensional dynamical system, subject to small stochastic per- 
turbations. Of interest is the amount of time expected to elapse before a 
particularly large fluctuation away from a deterministic point attractor oc- 
curs. This would correspond, in real-world terms, to an atypical string of 
memory allocations leading to an exhaustion of memory. 

The following two-dimensional 'colliding stacks' model was first studied
by Flajolet [?], having been first suggested by Knuth. Suppose that $N$ cells
of memory, arranged in a linear array, are available for use by two programs.
Suppose that at any given time, the programs will require $y^{(1)}$ and $y^{(2)}$ cells of
memory respectively. It will be most efficient for them to employ respectively
the first $y^{(1)}$ and the last $y^{(2)}$ cells of the array, so as to avoid contention for
memory. It is necessary that $y^{(1)} + y^{(2)} \le N$; if this inequality becomes an
equality, the two-program system runs out of memory.

A natural model for the evolution of $y^{(1)}$ and $y^{(2)}$ is as follows. At any
integer time $j = 1, 2, 3, \ldots$, there are four possibilities: $y^{(1)}$ may increase
by 1, $y^{(1)}$ may decrease by 1, $y^{(2)}$ may increase by 1, and $y^{(2)}$ may decrease
by 1. These are assigned probabilities $p/2$, $(1-p)/2$, $p/2$, $(1-p)/2$, for $p$ the
probability of a net increase in memory usage. Let us take $0 < p < \frac{1}{2}$, so that
deallocations of memory are more likely than new allocations. (Note that if
$y^{(1)} = 0$ or $y^{(2)} = 0$ the assigned probabilities must differ, since neither $y^{(1)}$
nor $y^{(2)}$ can go negative.)
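
The colliding-stacks dynamics can be simulated in a few lines. The sketch below is only illustrative: the function name `time_to_exhaustion`, the seed, and the sample values of N and p are assumptions, and the boundary rule (a deallocation attempted on an empty stack is simply ignored) is one possible choice that the text leaves unspecified.

```python
import random

def time_to_exhaustion(n_cells, p, rng, max_steps=10_000_000):
    """Return the first integer time at which the two stacks fill all n_cells.

    Returns None if exhaustion has not occurred within max_steps.
    """
    y = [0, 0]                        # cells used by program 1 and program 2
    for step in range(1, max_steps + 1):
        k = rng.randrange(2)          # each stack is selected with probability 1/2
        if rng.random() < p:
            y[k] += 1                 # allocation: overall probability p/2 per stack
        elif y[k] > 0:
            y[k] -= 1                 # deallocation: overall probability (1-p)/2
        # Assumed boundary rule: a deallocation on an empty stack does nothing.
        if y[0] + y[1] == n_cells:
            return step
    return None

# Illustrative values only: a small memory so that exhaustion happens quickly.
t_exit = time_to_exhaustion(n_cells=10, p=0.4, rng=random.Random(0))
print(t_exit)
```

Because the expected exhaustion time grows exponentially with the amount of memory, direct simulation of this kind is practical only for small N.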

Just as in the ALOHAnet model, it is natural to scale both time and
the state space as the amount of memory $N$ tends to infinity. However,
we shall not need to scale the model parameter $p$. Let
$x = (x_1, x_2) = (y^{(1)}, y^{(2)})/N$ be the normalized state of the two-program
system, and let $t = j/N$ be normalized time. $x$ jumps at
$t = 1/N, 2/N, 3/N, \ldots$ by an amount $N^{-1}\xi$, where $\xi$ is a random variable
with discrete density

$$
P\{\xi = z\} =
\begin{cases}
p/2, & \text{if } z = (1, 0); \\
p/2, & \text{if } z = (0, 1); \\
(1 - p)/2, & \text{if } z = (-1, 0); \\
(1 - p)/2, & \text{if } z = (0, -1).
\end{cases}
\tag{16}
$$

As defined, the density of $\xi$ is essentially independent of $x$. It is useful to
relax this assumption, so as to permit more realistic stochastic modelling of
dynamic data structures.



$$
P\{\xi(x) = z\} =
\begin{cases}
p(x_1)/2, & \text{if } z = (1, 0); \\
p(x_2)/2, & \text{if } z = (0, 1); \\
(1 - p(x_1))/2, & \text{if } z = (-1, 0); \\
(1 - p(x_2))/2, & \text{if } z = (0, -1)
\end{cases}
\tag{17}
$$

is a natural generalization. Here $p(x)$ (assumed to take values between 0
and $\frac{1}{2}$ exclusive) specifies the probability of an increase in memory usage
by either program, as a function of the fraction of available memory
which that program is currently using. We now write $\xi$ as $\xi(x)$, to indicate
the dependence of its density on $x$.

The normalized state $x$ is confined to the right triangle with vertices
(0,0), (1,0) and (0,1). The expected drift

$$
\langle \xi(x) \rangle = \bigl( p(x_1) - \tfrac{1}{2},\ p(x_2) - \tfrac{1}{2} \bigr)
\tag{18}
$$

may be viewed as a deterministic dynamical system on this two-dimensional
normalized state space. Clearly, the vertex (0, 0) is the global attractor.
In this model the two programs tend on the average not to use much memory.

Since there is only a single attractor, the quantity of interest is the
expected time which must elapse before a fluctuation of specified magnitude
occurs. Fluctuations which take the system state to the hypotenuse of the
triangle (where $x_1 + x_2 = 1$, or $y^{(1)} + y^{(2)} = N$) are fatal: they correspond to
memory exhaustion. The rate at which they occur can be estimated in the
large-N limit.
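
For small N the exit time can also be explored by direct simulation. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the constant value 0.4 is an arbitrary choice in (0, 1/2), and the boundary rule (a program already using no memory simply stays put when a decrease is drawn) is an assumption, since the text requires only that the probabilities differ there.

```python
import random

def simulate_exit(N, p, rng, max_steps=10_000_000):
    """Run the unnormalized chain (y1, y2) from (0, 0) until y1 + y2 == N.

    Returns (number of jumps, exit state), or None if max_steps is reached.
    """
    y = [0, 0]
    for step in range(1, max_steps + 1):
        i = rng.randrange(2)              # each program moves with probability 1/2
        if rng.random() < p(y[i] / N):    # increase with probability p(x_i)
            y[i] += 1
        elif y[i] > 0:                    # otherwise decrease ...
            y[i] -= 1                     # ... unless already at 0 (assumed rule)
        if y[0] + y[1] == N:
            return step, tuple(y)
    return None

def p_const(u):
    return 0.4  # hypothetical constant value in (0, 1/2)

rng = random.Random(0)
runs = [simulate_exit(12, p_const, rng) for _ in range(200)]
mean_exit = sum(steps for steps, _ in runs) / len(runs)
print(mean_exit)
```

Repeating the experiment for increasing N gives a rough empirical view of how quickly the mean exit time grows, although simulation becomes impractical well before the asymptotic regime is reached.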

This is a two-dimensional system, so the optimal (least-action) trajectories
are not determined uniquely by the zero-energy constraint. However we
still have

$$\langle t_{\mathrm{exit}} \rangle \sim \exp(N S_0), \qquad N \to \infty \qquad (19)$$

with $S_0$ the action of the least-action trajectory which exits the triangle
through the hypotenuse. The action is computed from the Lagrangian dual
to the Wentzell-Freidlin Hamiltonian

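$$
\begin{aligned}
H(x, p) &= \log \langle \exp(p \cdot \xi(x)) \rangle \\
&= -\log 2 + \log\{ \cosh p_1 - [1 - 2p(x_1)] \sinh p_1 \\
&\qquad\qquad\qquad + \cosh p_2 - [1 - 2p(x_2)] \sinh p_2 \},
\end{aligned}
\qquad (20)
$$

which follows from (17). Here $p = (p_1, p_2)$ is the momentum conjugate to
$x = (x_1, x_2)$, and is not to be confused with the function $p(\cdot)$.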
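
The closed form in (20) can be checked numerically against its definition as the logarithm of the expectation of exp(p · ξ(x)). The following sketch does so at one arbitrary point; to avoid the clash of symbols, the momentum is called `mom` in the code and `p` is kept for the function p(·). All names, the sample point and the function `p_example` are assumptions made for illustration. The last lines check, by substitution, that the nonzero momentum with components log((1 - p(x_i))/p(x_i)) has zero energy.

```python
import math

def hamiltonian(x, mom, p):
    """H(x, mom) from the closed form (20); mom is the momentum, p the function p(.)."""
    (x1, x2), (m1, m2) = x, mom
    inner = (math.cosh(m1) - (1 - 2 * p(x1)) * math.sinh(m1)
             + math.cosh(m2) - (1 - 2 * p(x2)) * math.sinh(m2))
    return -math.log(2) + math.log(inner)

def hamiltonian_from_density(x, mom, p):
    """The same quantity as log <exp(mom . xi(x))>, summed over the jumps in (17)."""
    (x1, x2), (m1, m2) = x, mom
    mean = (p(x1) / 2 * math.exp(m1) + p(x2) / 2 * math.exp(m2)
            + (1 - p(x1)) / 2 * math.exp(-m1) + (1 - p(x2)) / 2 * math.exp(-m2))
    return math.log(mean)

def p_example(u):
    return 0.4 - 0.2 * u  # assumed for illustration; values lie in (0, 1/2) on [0, 1]

x = (0.25, 0.5)
mom = (0.3, -0.7)
print(hamiltonian(x, mom, p_example), hamiltonian_from_density(x, mom, p_example))

# A nonzero momentum with zero energy: m_i = log((1 - p(x_i)) / p(x_i)).
zero_energy_mom = tuple(math.log((1 - p_example(u)) / p_example(u)) for u in x)
print(hamiltonian(x, zero_energy_mom, p_example))
```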

The zero-energy trajectories determined by (20) are studied at length in
Ref. [TT[ , where it is shown that the MPEP depends strongly on the behavior
of the function p(x). (See Fig. 3.) If p(x) is a strictly decreasing function,
so that the model is 'increasingly contractive,' with large excursions away
from the attractor strongly suppressed, then the MPEP turns out to be
directed along the line segment from (0, 0) to $(\tfrac12, \tfrac12)$. Its action is

$$S = 4 \int_{x=0}^{1/2} \tanh^{-1}[1 - 2p(x)] \, dx. \qquad (21)$$

If on the other hand p(x) is a strictly increasing function, so that the model 
is decreasingly contractive, with large excursions less strongly suppressed, 
then there is a twofold degeneracy. MPEPs are directed outward from (0, 0) 
to the two other vertices of the triangle, and 

$$S = 2 \int_{x=0}^{1} \tanh^{-1}[1 - 2p(x)] \, dx \qquad (22)$$

is their common action. 
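
The two actions (21) and (22) are easy to compare numerically for a given p(x). The sketch below uses a simple midpoint rule and three hypothetical example functions (decreasing, increasing and constant), all chosen only for illustration and all taking values in (0, 1/2); the function names are likewise assumptions.

```python
import math

def integrate(f, a, b, n=2000):
    """Composite midpoint rule; adequate for the smooth integrands used here."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def action_diagonal(p):
    """Action (21) of the path from (0, 0) to (1/2, 1/2)."""
    return 4 * integrate(lambda x: math.atanh(1 - 2 * p(x)), 0.0, 0.5)

def action_edge(p):
    """Action (22) of a path from (0, 0) to the vertex (1, 0) or (0, 1)."""
    return 2 * integrate(lambda x: math.atanh(1 - 2 * p(x)), 0.0, 1.0)

# Hypothetical example functions, all with values in (0, 1/2) on [0, 1].
examples = {
    "decreasing": lambda x: 0.4 - 0.2 * x,
    "increasing": lambda x: 0.2 + 0.2 * x,
    "constant": lambda x: 0.3,
}

for name, p in examples.items():
    print(name, action_diagonal(p), action_edge(p))
```

For the decreasing example the diagonal path has the smaller action, for the increasing example the edge paths do, and for constant p the two coincide. This follows because the integrand is increasing in x exactly when p(x) is decreasing.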

So when p(x) is strictly decreasing, there is a 'hot spot' on the hypotenuse
of the triangle at $(\tfrac12, \tfrac12)$. When the two-program system runs out of memory,
as $N \to \infty$ it is increasingly likely that each program will be using
approximately N/2 memory cells. If on the other hand p(x) is strictly increasing,
there are hot spots at the vertices (0,1) and (1,0). Exhaustion increasingly
tends to occur when one or the other program is using all, or nearly all, of the
N memory cells.

If p(x) is neither strictly increasing nor strictly decreasing, the large-N
asymptotics may become more complicated. The most easily treated case is
that of p(x) = p, a constant, i.e., the model of (16). In this model an
infinite degeneracy occurs: any trajectory which moves some distance (possibly
zero) from (0, 0) toward (0, 1) or (1, 0), and then moves into the interior of the
triangle at a 45° angle until it reaches the hypotenuse, is a least-action
trajectory. Large fluctuations away from the attractor may proceed along any
of this uncountable set of MPEPs. As a consequence there is no hot spot:
in the large-N limit, the exit location is uniformly distributed over the
hypotenuse. Flajolet first discovered this phenomenon combinatorially, but
it has a natural classical-mechanical interpretation. It is however a bit
counterintuitive: it says that when memory is exhausted, the fractions allocated
to each program are as likely to be small as large. This is a very sensitive
phenomenon: as we have seen, if p(x) is made strictly increasing or strictly
decreasing, the uniform distribution is replaced by hot spots.
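
The degeneracy for constant p can be made concrete with a little arithmetic. The sketch below rests on an assumption that is not spelled out in the text but can be checked by substitution in (20): on these outward paths the momentum conjugate to each increasing coordinate equals c = log((1 - p)/p), and the momentum conjugate to a coordinate held at zero vanishes, so that H = 0 throughout. The function name and the value p = 0.3 are hypothetical.

```python
import math

def degenerate_path_action(a, p):
    """Action of: edge segment (0,0) -> (a,0), then a 45-degree segment to the hypotenuse.

    Assumes constant p and momentum c = log((1 - p) / p) conjugate to every
    coordinate that is increasing, which satisfies H = 0 in (20).
    """
    c = math.log((1 - p) / p)
    edge = c * a                      # only x1 increases, by a
    rise = (1 - a) / 2                # each coordinate increases by (1 - a) / 2
    diagonal = 2 * c * rise           # both x1 and x2 increase
    exit_point = (a + rise, rise)
    return edge + diagonal, exit_point

p = 0.3  # hypothetical constant value in (0, 1/2)
for a in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(a, degenerate_path_action(a, p))
```

Every choice of a gives the same action c, while the exit point sweeps the hypotenuse from (1/2, 1/2) to (1, 0); the mirror-image paths along the other edge cover the remaining half.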



5 Conclusions 

We have seen that the Wentzell-Freidlin results on scaled jump processes 
throw considerable light on the fluctuations of stochastically perturbed dy- 
namical systems, in the weak-noise limit. The appearance of a classical 
Hamiltonian and Lagrangian, even if the unperturbed dynamical system is 
in no sense Hamiltonian, is quite striking. So is the central importance of 
zero-energy trajectories. 

In this lecture I have focused on jump processes since they are the most 
relevant to computer science applications. (Computing is inherently dis- 
crete.) But they also occur in chemical physics: there is always an integer 
number of molecules in any given region of space. Attempts are now be- 
ing made to interpret the stochastic aspects of chemical reactions in terms 
of optimal trajectories |2T|. This is very reminiscent of our focus on most 
probable exit paths (MPEPs). 

There is also a large deviation theory of continuous-time processes [||,
26], such as the diffusion processes specified by the stochastic differential
equation (Q). Associated to each such process is a Fokker-Planck equation
(a parabolic partial differential equation) describing the diffusion of probabil- 
ity. The zero-energy classical trajectories of continuous-time large deviation 
theory can be viewed as the characteristics of this differential equation. Nor- 
mally one expects only hyperbolic equations to have characteristics, but these 
characteristics are emergent: they manifest themselves only in the weak-noise 
limit. 

A large deviation theory of spatially extended systems would be an
interesting extension, but is still under development. Such systems include
stochastic partial differential equations and stochastic cellular automata.
In such systems an MPEP would be a trajectory in the system state space,
describing a most probable spatially extended fluctuation leading from one
metastable state to another. Much work has been done on this by statistical
mechanicians and field theorists (who call such fluctuations 'instantons' [22]),
but the theory is less complete than the theory I have sketched in this
lecture. The theory of extended fluctuations has in particular not been applied
to distributed computer systems. There is clearly much left to be done!

Figure 1: The expected drift velocity $\langle \xi(x) \rangle$ of the stochastic ALOHAnet
model, as a function of normalized network state x. Model parameters are
q = 0.43 and $q_1$ = 5.0, as in Ref. ||.



Figure 2: An impressionistic sketch of the parameter space of the stochastic 
ALOHAnet model. Within the horn-shaped region the network is bistable; 
outside it, monostable. The tip of the horn is analogous to a statistical- 
mechanical critical point. 



Figure 3: The triangular normalized state space of the colliding stacks model. 
Trajectory T1 is the most probable exit path when the function p(x) is strictly
decreasing, but if p(x) is strictly increasing then T2 and T2' are both MPEPs. 
Trajectory T3 is one of the uncountably many MPEPs which arise when p(x) 
is independent of x. 